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1.  Introduction 

Technical  difacuities  ansing  in  Lde  calcuiauon  of  marginal  postenor  densities  needed  for  Bayesian 
inference  have  long  served  as  an  impediment  to  the  wider  application  of  the  Bayesian  framework  to  real 
data.  In  the  last  few  years  there  have  been  a  number  of  advances  in  numerical  and  analytic  approximation 
techniques  for  such  calculations — see,  for  example,  Naylor  and  Smith  (1982,  1988),  Smith  ei  al  (1985. 
1987),  Tierney  and  Kadane  (1986),  Shaw  (1988),  Geweke  (1988) — but  implementation  of  these  approaches 
c>-picaily  .-equires  saphisucated  numerical  or  analytic  approximation  expertise  and  possibly  specialist 
software.  In  a  recent  paper,  Gelfand  and  Smith  (1988)  described  sampling  based  approaches  for  such  calcu¬ 
lations,  which,  by  conaast,  are  essentially  tnvia.  to  implement,  even  with  limited  computing  resources.  In 
this  previous  paper  we  entered  caveats  regarding  the  computational  efficiency  of  such  sampling  based 
approaches,  but  our  continuing  investigations  have  shown  that  adaptive,  iterative  sampling  achieved  through 
the  Gibbs  sampler  (Ceman  and  Geman,  1984)  is.  in  fact,  surpnsingiy  efficient,  converging  remarkably 
quickly  for  a  wide  range  of  problems. 

Our  objective  in  this  paper  is  to  provide  illustrations  of  a  range  of  applications  of  the  Gibbs  sampler 
in  order  to  demonstrate  its  versatility  and  ease  of  implementation  in  practice.  We  begin  by  briefly  review¬ 
ing  the  Gibbs  sampler  in  Section  2.  In  Section  3.  based  upon  computational  expenence  with  a  vanety  of 
problems,  we  offer  several  suggestions  on  assessing  the  convergence  of  this  iterative  algorithm.  In  Section 
-1  we  begin  our  illustrative  analysis  with  a  variance  components  model  applied  to  a  ‘nasty’  data  set  intro¬ 
duced  in  Box  and  Tiao  (1973),  whose  Bayesian  analysis  therein  invoived  elaborate  exact  and  asymptotic 
methods.  In  addition,  we  illustrate  the  ease  with  which  inferences  for  functions  of  parameters,  such  as 
ratios,  can  be  made  using  Gibbs  sampling.  In  Section  5,  we  take  up  the  k-sample  normal  means  problem  in 
the  general  case  of  unbalanced  data  with  unknown  population  variances.  In  particular,  we  show  that  the 
previously  inaccessible  case  where  the  population  means  are  ordered  is  straightforwardly  handled  through 
Gibbs  sampling.  Application  is  made  to  an  unbalanced  generated  data  set  from  normal  populations  with 
known  ordered  means  and  severely  non-homogeneous  variances.  In  Section  6,  we  look  at  a  population 

C»4 

linear  growth  curve  model,  as  an  illustration  of  the  power  of  the  Gibbs  sampler  in  handling  complex 

/or 

hierarchical  models.  We  analyse  data  on  the  response  over  time  of  30  rats  to  a  treatment,  with  a  total  of  66 


.r.voivir.g  me  ccmpar.sor.  of  fA-o  drug  formulations,  m  order  :o  illustrate  tr.e  ease  'Aim  which  the  Gibbs 
iampier  aeols  with  complications  ar.sing  from  missing  data  in  an  originally  balanced  design.  .A.  summar;.- 
discussion  is  provided  in  Section  S. 

2.  Gibbs  sampling 

In  the  sequel,  densities  will  be  denoted,  genencaily,  by  square  brackets  so  diat  joint,  conaitionai  ana 
■Tiarginal  forms  appear,  .-espectively.  as  [.Y.y],  [,X\Y]  and  [Y],  The  usual  marginalisation  bv  integration 
procedure  will  be  denoted  by  forms  such  as 

[X]  =  j[X\Y]-[Y]. 

Throughout,  we  shall  be  dealing  with  collections  of  random  variables  for  which  it  is  known  (see.  for  exam¬ 
ple.  Besag,  1974)  that  specincation  of  all  full  conditional  distributions  uniquely  determines  the  full  joint 
density.  .More  precisely,  for  such  a  collection  of  random  variables  f/, ,  D’;,...,  L't,  the  joint  density, 

. L\],  is  uniquely  determined  by  r  ^  s\,  s  =  1.2 . k.  Our  interest  is  in  the  marginal 

aistribuDons.  [U,],  s  =  1,2 . c. 

■An  algorithm  for  extracung  marginal  distributions  fre  th :  .'uil  conditional  distribution  was  formally 
introduced  as  the  Gibbs  sampler  in  Geman  and  Geman  (1984).  fhe  algorithm  requires  ail  the  full  condl- 
uonai  distributions  to  be  ‘available’  for  sampling,  where  ‘available’  is  taken  to  mean  that,  for  example,  U, 
can  be  generated  straightforwardly  and  efficiently  given  specified  values  of  the  conditioning  variables.  LG, 
r  X  s. 

Gibbs  sampling  is  a  .Markovian  updating  scheme  which  proceeds  as  follows.  Given  an  arbitrary  start¬ 
ing  set  of  values  we  draw  Ul'*  from  [f/ilLj®^ . 0'*°’],  then  L"^'*  from 

.'LG  jL'i' L^3°’ . ...  and  so  on  up  to  Lf*”  from  [LGlL'i'” . LVi’i]  to  complete  one  iteration  of  t.he 

scheme.  .After  t  such  iterations  we  would  amve  at  (L^J'* . L'*'’).  Geman  and  Geman  show  under  mild 

conditions  that  t//’ - -  U.  -  :L,.  j  as  r  — >•  <».  Thus,  for  t  large  enough  we  can  regard  U)‘^  as  a  simulated 


'ser.auca  :rcm 


Repilcaung  this  process  •n  ti.T.es  produces  m  lid  i-cupies  _ 


rcr  anv  s,  tn. 


soilecuon  . oVm  viewed  as  a  simulated  sample  from  [Li;],  .ne  marginal  density  is  then 

:samated  bv  the  finite  mixture  density 


'oj  =  m~'  ^  [U;\U,  =  UlP.  r  ~ 
=  1 


See  Geifand  and  Smith.  IvSS,  for  further  discussion.) 

Since  the  expression  i  i';  can  be  viewed  as  a  Rao-Slackweilized’  density  estimator,  relative  to  the 

more  usual  kernel  density  estimators  based  upon  U'.'.'.j  =  1 . m.  estimation  efficiency  is  high  and  we  find 

m  -  ICO  Cat  most  200)  to  be  adequate  in  practice  as  a  converged  sample  size  on  which  to  base  the  marginal 
density  estimate. 


Suppose  interest  centres  on  the  marginal  distribution  for  a  variable  V  which  is  a  function  g(C/i . U^) 

of  U, . 6't.  We  note  that  evaluation  of  g  at  each  of  the  {U{‘/ . provides  samples  of  V.  In  this 

case,  a  density  esumate  of  the  form  il)  is  not  available,  but  an  ordinary  kernel  density  estimate  can  readily 
be  calculated  (see  Secuon  4  for  an  illustration  of  this). 


•All  applications  we  consider  in  this  paper  are  within  the  Bayesian  framework,  where  the  U,  are  unob¬ 
servable,  representing  either  parameters  or  missing  data  (and  V  can  thus  be  a  function  of  the  parameters 
which  we  are  interested  in).  .All  distributions  will  be  viewed  as  conditional  on  the  observed  data,  whence 
marginal  distributions  become  the  marginal  posteriors  needed  for  Bayesian  inference  or  prediction. 


3.  Convergence  diagnostics 

Our  expenence  thus  far,  in  a  variety  of  real  and  simulated  data  analyses,  shows  remarkably  rapid 
convergence  of  the  Gibbs  sampler.  In  acquiring  this  computational  experience,  we  have  experimented  with 
a  vanety  of  diagnostic  tools  to  facilitate  concluding  whether  or  not  the  algorithm  has  'converged'.  Gen¬ 
erally,  me  most  natural  and  least  sophisticated  approach  has  been  the  most  useful.  Namely,  for  a  fixed  m 


■■'a  ir.craasa  overlay  piots  c:  :r.e  resaiunc  esumatea  donsiaes.  aaa  see  i:  Lhe  esamaies  are  eaaivaier.i 
racier  a  .dick  :ei:  tip  pea'  test.  Siaaiiariy.  -A-e  also  increase  m  at  lixed  i  to  assess  stabiiitv  ci'  die  density 
esamate.  ia  our  experimentation  'xe  increase  :  in  muiuples  ot  10  never  having  required  :  >  eO.  'A'e  tvpi- 
caiiy  set  .-n  =  50  or  m  =  iCO.  Requisite  generauon,  even  tor  larger  t  and  m.  is  usually  neither  costly  acr 
I  low.  Univariate  piots  are  drawn  by  selecting  -0  equally  spaced  points  in  the  er'iective  domain  ot'  the  vari¬ 
able.  dVe  then  evaluate  the  density  esumate  (of  the  form  (1))  at  tnese  points  and  a  spline-smootned  curve  is 
drawn  uhrough  these  values.  3y  eifecuve  domain,  we  mean  me  interval  where,  say.  of  the  mass  lies. 
'■Ve  occasionally  require  several  passes  to  determine  this  domain  and  occasionally  require  man  -ij 

points  to  obtain  a  sausfying  plot.  Cleany.  this  piotung  method  ccuid  be  reaned,  but  sucn  issues  are  not  me 
main  concern  of  this  paper.  In  this  regard,  we  also  recommend  a  convenient  check  on  calculations  by  using 
a  simple  trapezoidal  integration  on  tne  collection  of  esumated  density  values  to  see  how  close  me  result  is 
to  1. 


.Vlonitonng  across  iterations  of  summary  statistics  such  as  sample  moments  or  quanules  has  not  pro¬ 
ven  effecuve.  If  we  successively  study  differences  or  relative  differences  in  such  statistics,  it  is  not  easy  to 
assess  when  these  quantities  are  stable.  Calculation  of  standard  errors  for  such  differences  is  difficult  in 
part  due  to  the  unknown  dependence  structure  between  successive  iterations  taithougn  comparison  of  itera¬ 
tions,  say  10  apart,  wiil  mitigate  the  dependence  issue).  Sample  reuse  methods  might  be  tned.  .mowever. 
.'Other  than  a  comparison  of  a  few  summary  statistics,  we  prefer  an  overall  distributional  comparison 
between  iterates. 


An  attractive  graphical  tool,  which  we  have  found  very  useful,  is  the  empirical  quantile-quantile  plot. 
With  m  constant  across  iterations,  such  plots  are  easily  obtained.  One  only  need  order  the  generated  sam¬ 
ples  at  the  iterations  to  be  plotted.  Using  m  =  100  fperhaps  2001,  under  convergence  the  plotted  points 
should  generally  be  close  to  the  45°  line.  Creating  such  displays  over  increasing  numbers  of  iterauons 
enables  us  to  distinguish  inherent  variation  from  lack  of  convergence.  Such  displays,  with  their  inherent 
variation,  accord  with  the  aforementioned  ‘thick  felt-tip  pen’  comparison.  We  offer  illustrative  displays  in 
scnmnction  with  the  variance  components  example  of  Section  4. 


'■\'e  .r.cr.Lica  s  :;r.a;  coin:  ^r.icn  :s  pcrur.er.t  :o  :ne  assessmcr.t  or  ocnvcrgor.co.  .  .\c  Niarkoviar.  r.arure 
::  Lho  '.'.orativc  orccess  rr.eans  -’'.ar  apparent  ccnveraer.ee  can  be  temporaniy  penurDcd  bv  an  unrypical' 
ra.Tipie.  Thrs  can  upset  our  crac.'.csacs  ibr  per.naps  several  subsequent  iterauons  until  stability  is  restored. 
.\n  obvious  way  to  'robustify'  tee  process  is  to  work  with  pooled  successive  samples  within  some  moving 
vindow.  Again,  we  s.bail  not  rccas  on  such  reanements  in  this  paper. 


a.  Variance  comnonent  problems 

Random  ett'ects  models  zrz  ve.-/  naturailv  modelled  wiinin  the  Bayesian  t'ramework.  .Vonetheiess. 
calculation  or  die  marginal  poster.or  distnbuuons  ot  variance  components  and  functions  of  variance  com¬ 
ponents  has  proved  a  challenging  technical  problem.  Box  and  Tiao  (.1973)  report  a  subsunual  amount  of 
detailed,  sophisticated  approximauon  work,  both  analytic  and  numerical.  Skene  1 19S3)  considers  purpose- 
built  nu.mericai  techniques.  The  methods  described  by  Smith  et  a:  (T985.  1987)  require  careful  reparametri- 
cation  dependent  upon  both  the  data  and  the  choice  of  prior.  In  a  similar  spint,  .Achcar  and  Smith  (1988) 
discuss  parameter  u-ansformations  for  successful  impiementauon  of  Laplace's  method  (Tierney  and  Kadane. 
1986).  By  comparison,  the  Gibbs  sampling  approach  is  remarkably  simple. 

■'Ve  shall  illustrate  the  approacn  with  a  model  involving  oniy  two  variance  components,  but  it  will  be 
clear  that  the  development  for  more  complicated  models  is  no  more  difficult.  Consider,  then,  the  variance 
components  moaei  defined  by 


where,  assuming  conditional  independence  throughout,  =  .Vf/r.o-g)  and  =  VTO.cr*).  Let 

9  =  10, . 9^'),  Y  =  (Jii . Y/cj)  and  assume  that  ^t, C7", <T“  are  independent  with  pnors  [/r]  =  iVIjUo.cTq), 

[cri]  =  IG(ax,b{),  [o’*]  =  IGia-^/o^)  (see,  for  example.  Hill.  1965),  where  IG  denotes  the  inverse  gamma 
distnbuuon  and  Uo,0(P,j|,hi,a2,02  assumed  known.  It  is  then  straightforward  to  verify  that  the  Gibbs 


[al\Y.;j..9.c~]  =  IG\a,  ~  ^K,  b,  -il{9,-ur) 

[a;  \  r.pt,  9.  g'^  ;  =  IGia.  -  -O.  o,  -  tZZiY,,  -  9,  )*) 


sampler  is  specified  by: 


•  - ro  —  -  ^ - 


\  cr^'-rA'cT'  :j^-Ka-, 


I  J  <7  a  -  7  ~ 

)\y.-a,c^.c!:\  =  ,v  —.— -.  --■  ai.  - 

\jGQ-rOg  j7^-rG^ 


^s<j:  ., 

- : - -1  ). 

ag^a~  i 


■A.'here  Y  =  iT, . y^),  >'  =  ~Y,JJ,  1  is  a  A'xl  column  vector  of  I’s  and  /  is  a  KxK  identity  matrix.  In 

particular,  in  (3),  we  can  allow  the  a.  and/or  b:  equal  to  zero,  representine  a  ranee  of  conventional  improper 
pnors  for  cTf  and  <J~. 

Box  and  Tiao  (.1973.  Secuon  5.1.31  introduce  two  data  sets  for  which  tnc  model  (Z)  is  appropriate. 
The  second  and  more  difficult  set  is  generated  from  random  normal  deviates  with  u  =  5.  =  -i.  cry  =  16. 

The  resultant  data,  summarised  in  Table  1,  are  badly  behaved,  in  that  the  standard  ('.-AN'OVA  based) 
unbiased  estimate  of  cry  is  negative,  rendering  inference  about  cry  difficult.  We  shall  use  this  example  to 
provide  a  challenging  low  dimensional  test  of  the  Gibbs  sampler. 

Table  1:  Generated  Data  (Box  and  Tiao.  1973.  p  2471 

A  =  6  7  =  5  F  =  5.6656 


Batch 


1 


Y 


6.2268  4,6560  7.5212  5.6848  6.0796  3.8252 

8.8650  25.4900  25.6359  ".0935  14.3590  S.2691 


Source 

SS  df 

MS 

Between  Batches 

41.6816  5 

8.3363 

Within  Batches 

358.7014  24 

14.9459 

Total 

400.3830  29 

(7y  =  14.9459  d-y=  -1.3219 


For  lilustniuve  p'.:rposes.  ■•‘.■e  provide  a  Bayesian  aroiysis  basea  on  me  pr.ar  speciucaiiun 
\~7\  =  /Gi'0,0),  [p.]  =  .V'0, 10'“',  together  with  either 

/:  [ci]  =  IG(O.O)  or  If:  [<7^^]  =  IG(i.bi). 

Under  /,  we  have  the  improper  prior  for  suggested  by  Hill  (1965),  which  is  a  naive  two  dimen¬ 

sional  extension  of  the  familiar  non-informauve  pnor  for  a  vanance.  Under  II.  we  have  a  proper  weak 
indepencent  inverse  chi  square  pnor  for  Cif  which,  depending  upon  b<.  ‘supports'  or  'differs  from'  the  aata 
see  Skene,  1976  for  further  detailed  discussion).  The  two  pnors  for  cr^  differ  considerably.  Under  /  [cr^ ] 
■.s  one-tailed.  giving  strong  weight  to  the  assertion  that  cr,  is  near  zero.  As  this  is  weaxiy  coniirmed  by  me 
data,  the  marginal  postenor  (Pigure  la)  reiiects  this  prior.  Under  II,  [crj^l  is  two-tailed,  having  mode  at 
Zbu2.  Interestingly,  expenmentation  with  b^  varying  up  to  6  leads  to  an  outcome  similar  to  that  under  /. 
For  all  such  dj,  the  prior  is  virtually  reproduced  as  the  posterior  (sec  Figure  2a  for  the  case  b^  =  1).  The 
data  provide  very  little  information  about  cr|. 

Our  experience  with  Gibbs  sampling  in  this  context  is  very  encouraging.  Under  both  /  and  II  (with 
b^  =  11,  the  iteraave  approach  had  no  difficulty  with  the  extreme  skewness  in  the  resultant  posterior  of  o’l’. 
Overall  convergence  was  achieved  under  /  within  at  most  20  iterations  using  m  =  iOO.  ana  under  II  within 
at  most  10  iterations  using  m  =  100.  We  demonstrate  this  in  Figure  1,  which,  for  case  I,  compares  density 
estimates  after  20  and  40  iterations  for  <J~  and  cr^.  Figure  2  presents  the  corresponding  curv'es  for  case  II 
after  10  and  20  iterations.  In  Figures  3  and  4,  we  show  several  empirical  Q-Q  plots  for  cr“  and  Cg. 
respecuvely,  under  II,  again  using  m  =  100  points.  We  compare  the  first  iteration  with  the  second,  the 
second  with  the  third,  the  third  with  the  founh  and  at  ‘convergence’ — the  ninth  with  the  tenth.  Note  that 
for  CT"  we  essentially  have  convergence  at  the  third  iteration. 

The  vanance  ratio,  cr|/c7",  or  perhaps  the  intra-class  correlation  coefficient  Cg  Hal  ^a~)  are  often 
quantities  of  interest.  Remarks  at  the  end  of  Section  2  show  that  obtaining  the  marginal  posterior  distribu¬ 
tion  for  such  variables  is  easily  accomplished.  Figure  5  shows  the  estimated  density  for  the  variance  ratio 
under  both  I  and  II  obtained  after  20  iterauons  with  m  =  1000,  the  untypicaily  large  value  of  m  arisin  • 
from  the  uwkward  shape  of  the  postenor.  A  density  estimator  with  normal  kernels  was  used  with  window 


'■.dLh  suggested  ;p.  iiiverman  i  i9S6  p.-S'i. 

5.  Normal  means  problems 

The  comparison  oi  means  presumed  from  normal  populations  is  arguably  the  most  ubiquitous  model 
:n  stausucal  inference,  but  issues  such  as  unbalanced  sampling  and  heterogeneity  of  variances  have  p/pi- 
cally  forced  compromises  in  irequentist  and  empincal  Bayes  approaches.  Kistoncally,  this  has  also  been 
somewhat  true  in  the  purely  Bayesian  setting.  Frequently,  with  regard  to  variance  parameters  the  proper 
Bayesian  procedure  of  marginaiisauon  oy  integrauon  has  been  replaced  by  point  esumation.  in  order  to 
reduce  the  dimensionality  of  numencal  integrations  needed  to  obtain  marginal  postenor  distnbutions  for 
mean  parameters.  Gibbs  sampling  provides  a  means  of  performing  such  integrations  without  having  to 
make  approximauons.  The  Gibbs  sampler  was  introduced  in  the  context  of  problems  of  very  high  dimen¬ 
sion  (such  as  image  reconstruction,  expert  systems,  neural  networks)  and  has  been  spectacularly  successful 
in  such  contexts.  Its  encouraging  performance  in  our  investigations  is  therefore  not  surprising  since  even  a 
large  muitiparameter  Bayesian  problem  is  of  small  dimension  compared  to  typical  image  processing  prob¬ 
lems. 

In  mis  section,  we  consider  the  comparison  of  /  population  means  which,  in  conjunction  with  distinct 
unknown  population  variances  and  an  exchangeable  prior,  results  in  a  21  +  2  parameter  problem.  We  show 
that  the  implementation  of  the  Gibbs  sampler  is  straightforward.  The  more  general  case  v'here  the  popula¬ 
tion  means  are  represented  as  linear  functions  of  a  set  of  explanatory  variables  can  be  handled  similarly 
using  by  now  familiar  distribution  theory  given  in,  for  example,  Lindley  and  Smith  (1972).  Such  an  exam¬ 
ple,  involving  66  parameters,  appears  in  Section  6. 

Often  there  are  implicit  order  restrictions  on  the  means  to  be  compared.  For  instance,  it  may  be 
known  that  the  means  are  increasing  as  we  traverse  the  populations  from  i  =  I  up  to  I.  If  we  incorporate 
this  information  into  our  prior  specification  using  order  statistics,  the  integrations  required  for  marginalisa¬ 
tion  are  typically  beyond  the  capacity  of  current  numerical  and  analytic  approximation  methodology.  How¬ 
ever,  as  we  show  below,  the  Gibbs  sampler  is  still  straightforwardly  implemented  since  normal  fui! 


-'cr.ciucr.j' ;  ii.T.Diy  replaced  bv  ircacated  nerraars. 

The  requisite  discnbuuon  theory  assuming  no  order  restnctions  on  the  means  is  as  I'ollows.  Assuming 

eondiuonai  independence  throughout,  let  [ I'.,- 1 , <Tf  ]  =  .VY5,-,(7")  i=  1 . ;=  i . = 

'erfj  =  /Ci'u;,di),  [;i]  =  and  (r']  =  !G(a^,b2).  where  iG  denotes  the  Inverse  Gamma 

dismbuuon  and  a, ,  J2. assumed  known  (often  chosen  to  represent  eonvenuenai  improper 
rr.cr  ferms;  see  Secuon  3y  sufficiency,  we  confine  attenuon  to  =  r:’. ana  dp  = 

T.':', Letung  6  =  ^  di . 9/'),  <y~  =  icrr . cf  )  ana  Y  =  . IT,  if . ir),  we  have,  tor 

given  data  Y.  the  following  full  condiuonai  distnbutions. 

id\Y.(j-.u.-r]  =  S(e*.D"), 


wnere 


Q’-  -  ''iyj-^^uerr 


and 


D‘  = 


<y.-r~ 

n,r+cr.* 


D‘  =  0.  i  *  /. 


wnere 


[crhY,.,S.\d,]  =  IG(a,^in^.t?^^i  Y 


[/iiy.e.o'.r*]  =  /VI 


/i/j  (Tr)  ^9  i  r”  0*0 


-/ctq- 


r^|y,5,cr.;z]  =  /Ct'a^-ri/.  b2^z^(9i-pLr). 


Suppose  now  that  the  means  are  known  to  be  ordered,  say,  9^  <  9-^  <  ...  <  9, .  If  we  assume  as  our  pnor 
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•.hat  the  c^,  anse  as  order  statistics  trom  a  sample  of  size  /  from  then  it  is  straightfor.vard  to  show 

that  [(9.  j  J',  t?,,;  ?=  i,  cr./tt,  r"]  is  now  precisely  the  marginal  normal  dismbution  in  (4),  but  restricted  to  the 
inter\-ai  [»:-!.  ^.-iJ  twhere  we  adopt  ’die  convenuon  ^!~i  =  4nd  so  again  is  straightfor- 

•aardly  available  for  sampling.  The  mil  conditional  distributions  for  <j~.  r~  and  fi  remain  exactly  as  above. 

In  sampling  from  the  truncated  normal  distribution,  the  rejection  method  (discarding  ineligible  obser¬ 
vations  sampled  from  the  r.on-iruncated  distribution!  -a/iH  tend  to  be  wasteful  and  slow,  parucuiariy  if 
IS  small.  To  dra'A/  an  observation  from  N(c.d~)  restricted  to  {a,b)  a  convenient  'one-for-one' 
sampling  method  is  the  foilo-A/ing  ((Devroye,  1986).  Generate  U,  a  random  uniform  (0, 1)  variate  and  calcu¬ 
late  Y  =  u.j.c.a')),  where 

■A/ith  0  denoting  the  standard  normal  cdf.  It  is  straightforward  to  show  that  Y  has  the  desired  distribution. 
These  ideas  are  easily  extended  to  give  a  general  account  of  Bayesian  analysis  for  order  restricted  parame¬ 
ters,  but  details  will  be  deferred  to  a  subsequent  paper. 

In  order  to  study  the  performance  of  Gibbs  sampling  in  the  above  setting,  we  analysed  generated  data 
so  as  to  be  able  to  calibrate  the  results  against  the  known  situation.  For  the  purpose  of  illustration,  we 
created  a  rather  unbalanced,  extremely  non-homogeneous,  data  set  by  setting  /  =  5  and  for  the  ith  popula¬ 
tion.  1  =  1 . 5  drawing  /i,  =  21-1-4  independent  observations  from  N(i,r).  The  simulated  data  is  summar¬ 

ised  in  Table  2,  and  we  note,  in  particular,  the  inversion  of  order  of  the  sample  means,  Ti  and  Y<. 


Table  2;  Summary  of  simulated  data  for  Normal  Means  problem 


1 

Sample 

1 

2 

3 

4 

5 

i 

1  ''' 

6 

8 

10 

12 

14 

Y, 

0.3191 

2.034 

3.539 

6.398 

4.811 

Sr 

0.2356 

2.471 

5.761 

8.758 

19.670 

For  lilustrauon,  'vs  sp^crded  priors  [/^j  =  .ViO.  ICP).  [cr."]  =  !G(i,  1)  and  [7~]  =  /oii,  Ij.  For  the 
Gibbs  sampler,  convergence  was  achieved  within  ten  iterations  tor  the  unordered  case  using  m  =  100.  The 
ordered  case  required  at  most  twenty  iterations,  again  using  m  =  100,  except  for  [di\Y]  and  [9^\Y]  which 
both  required  m  =  1000.  Rather  than  graphically  documenting  the  convergence  in  this  case,  we  compare 
the  unordered  and  ordered  marginal  posteriors.  Let  [9-\Y]^  and  [9,\Y]q  denote,  respectively,  the  unor¬ 
dered  and  ordered  density  estimates.  In  Figure  6a  we  consider,  for  example,  (?2  and  see  that  and 

have  roughly  the  same  mode  but  Lhat  [92\Y]q  is  less  dispersed.  Utilizing  the  order  information 
results  in  a  sharper  inference.  In  Figure  6b,  we  consider  both  9^  and  9^.  As  would  be  expected,  given  the 
sufficient  statistics,  [9^\Y]^  lies  to  the  left  of  [9i\Y]^  and  is  verx'  dispersed.  Utilizing  the  order  informa¬ 
tion  places  [flilTlo  and  [flfiyio  in  the  proper  stochastic  order,  pulls  the  modes  in  the  correct  direction  and 
reduces  dispersion. 


6.  An  hierarchical  model 

Applications  of  hierarchical  models  of  the  kind  introduced  by  Lindley  and  Smith  >1972)  abound  in 
fields  as  diverse  as  educational  testing  (Rubin,  1981),  cancer  studies  (DuMouchel  and  Hams,  1983)  and 
biological  growth  curves  (Strenio,  Weisberg  and  Bryk.  1983).  However,  both  Bayesian  and  empirical  Baye¬ 
sian  methodologies  for  such  models  are  typically  forced  to  invoke  a  number  of  approximations,  whose 
consequences  are  often  unclear  under  the  multiparameter  likelihoods  induced  by  the  modelling.  See,  for 
example,  Morris  (1983),  Racine-Poon  (1985)  and  Racine-Poon  and  Smith  (1989)  for  details  of  some 
approaches  to  implem'’nting  h'crarchicaJ  model  analysis.  By  contrast,  a  full  implementation  of  the  Baye¬ 
sian  approach  is  t- .  achieved  using  the  Gibbs  sampler,  at  least  for  the  widely  used  normal  linear 
hierarchical  model  stnic'i  ,,- 

For  illustration,  we  focus  on  the  following  population  growth  problem.  In  a  study  conducted  by  the 
CIBA-GEIGY  company,  the  weights  of  thirty  young  rats  were  measured  weekly,  for  five  weeks.  The  data 
are  given  in  Table  3,  with  weight  measurements  available  for  all  five  weeks. 
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Table  3:  Rat  population  growth  data 


Rat 

•T3 

^i5 

Rat 

^.2 

•’^.3 

Xj  4 

^,5 

1 

151 

199 

246 

283 

320 

16 

160 

207 

248 

288 

324 

T 

145 

199 

249 

293 

354 

17 

142 

187 

234 

280 

316 

; 

147 

214 

263 

312 

328 

18 

156 

203 

J 

283 

317 

a 

155 

200 

237 

272 

297 

19 

157 

212 

259 

307 

336 

5 

135 

188 

230 

280 

jua 

20 

152 

203 

246 

286 

321 

6 

159 

210 

252 

298 

331 

21 

154 

205 

253 

298 

334 

7 

141 

189 

231 

275 

305 

^2 

139 

190 

225 

267 

302 

8 

159 

201 

248 

297 

338 

23 

146 

191 

2'^9 

272 

302 

9 

177 

236 

285 

340 

376 

24 

157 

211 

250 

285 

323 

10 

134 

182 

220 

260 

296 

25 

132 

185 

237 

286 

331 

11 

160 

208 

261 

313 

352 

26 

160 

207 

257 

303 

345 

12 

143 

188 

220 

273 

314 

27 

169 

216 

261 

295 

333 

13 

154 

200 

244 

289 

325 

28 

157 

205 

248 

289 

316 

14 

171 

221 

270 

326 

358 

29 

137 

180 

219 

258 

291 

15 

163 

216 

242 

281 

312 

30 

153 

200 

244 

286 

324 

= 

8.  X,2 

=  15, 

^i‘i  ~ 

22.  j:,  4 

=  29,  x,5  = 

36  days,  t  = 

1 . 

30. 

For  the  time  period  considered,  it  is  reasonable  to  assume  individual  straight-line  growth  curves  so  that, 
under  homoscedastic  normal  measurement  errors. 

Yij  -  N(ai+I5,x,j,a‘)  (i  =  1 . j  =  1 . n.) 

provides  the  full  measurement  model  (with  k  =  30.  n,  =  5,  and  j:,y  denoting  the  age  in  days  of  the  ith  rat 
when  measurement  j  was  taken).  The  population  structure  is  modelled  as 


(i  =  1 . k) 


asiurr.inc  conditionai  independence  ihroughout.  A  mil  Bayesian  analysis  now  requires  die  speciacauon  ot  a 
prior  ;cr  cr",  jj.  -  ioro,/)o)  and  E.  Typical  inferences  of  interest  in  such  studies  include  marginal  posteriors 
for  die  population  prarameters  oio./Jo  predictive  intervals  for  individual  future  growth  given  the  first 
week  measurement.  We  shall  see  that  these  are  easily  obtained  using  the  Gibbs  sampler. 

For  the  pnor  specification,  we  take 

:o  have  a  normai-Wishart-inverse-gamma  form, 

[n]  =  N{-nx) 

[Z-'l  =  W{{pR)-\p) 

Rewriting  the  measurement  model  for  the  ith  individual  as  Y,  -  N{Xidi,a~I^)  with  0.  =  (a, ■,/?,)'  and  X, 
denoting  the  appropnate  design  matrix,  and  defining 

^  =  (^1 . YkV,  9  =  k-^i9„  n=in,, 

.  =  1  ;=i 

D.  =  <7~^Xi'^Xi-r£-' 

V  = 

the  Gibbs  sampler  for  9  =  {9i . 0*),  Z,  cX  (a  total  of  66  parameters  in  the  above  example)  is  straightfor¬ 

wardly  seen  to  be  specified  by  the  following  conditional  distributions: 

[G;\Yx,^~\<y^]  =N[D,{c-^X^Y,^Z-'p),Di]  (1=1 . k.)  (5) 

[p\Y,[9\,Z'\<y~\  =  N[V{kZ-'9^C-'Ti),V] 

[r-‘l}',[0),/r,a-]  =  W{[Z(9,-p)(9,-pf*pRr'.k  +  p] 

I 


[<j-l r,  =  /g(  ^ .  \ {i{Y,-x,d,)'{Y.-x,9,) - -o j 


-or  the  analysis  of  the  rat  growth  data  given  above,  the  prior  specification  was  defined  b> 


C~‘  =  0,  v'o  =  0.  p  =  2,  R  = 


100  0  'i 

0  0.1  /’ 


.-efiecting  rather  vague  initial  information  relative  to  that  to  be  provided  by  the  data.  Simulation  from  the 
'.Vishart  distnbution  for  the  2x2  .matnx  is  easily  accomplished  using  the  algorithm  of  Odell  and  Feive- 
son  1 1966'):  with  G(.,.)  denoting  gamma  distributions,  draw  independently  from 


li/i]  =  [C/,]  =  G(^,:i].  [.V]  =  ,V(0. 1): 


Ux  nW\ 
NiU[  U.+N^  ' 


then  if  5"'  = 


X"’  =  -  W(S~'.v). 


The  iterative  process  can  be  conveniently  monitored  by  observing  Q-Q  plots  for  ccrj.PQ.cr  and  the  eigen¬ 
values  of  £~'’.  For  the  data  set  summarized  in  Table  3,  convergence  is  achieved  with  about  35  cycles  of 
in  =  50  drawings. 

As  we  remarked  earlier,  full  Bayesian  analysis  of  structured  hierarchical  models  involving  covariates 
has  hitherto  presented  difficulties  and  a  number  of  Bayes/empirical  Bayes  approximation  methods  have  been 
proposed.  Racine-Poon  and  Smith  (1989)  review  a  number  of  these  and  demonstrate,  with  a  range  of  real 
and  simulated  data  analyses,  that  the  EM-type  algorithm  given  by  Racine-Poon  (1985)  seems  to  be  the  best 
of  these  proposed  approximations.  However,  it  can  be  seen  from  Figure  7,  where  we  present  the  estimated 
postenor  marginals  for  the  population  parameters,  that,  even  with  this  fairly  substantial  data  set  of  30x5 
observations,  the  EM-type  approximation  is  not  really  an  adequate  substitute  for  the  more  refined  numerical 
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jpproxi.'TiaQon  provided  by  ihe  Giobs  sampier.  '.Kere.  [he  EM-based  ‘posterior  density'  is  the  normal  condi¬ 
tional  :onn  (5)  with  the  converged  estimates  from  the  Racine-Poon  algorithm  substituted  for  the  condition¬ 
ing  parameters.) 

To  further  underline  the  effectiveness  of  the  Gibbs  sampler,  and  the  danger  of  point-estimation  based 
approximations  in  hierarchical  models,  we  reanalysed  two  subsets  of  the  complete  data  set  of  150  observa¬ 
tions  given  in  Table  3.  chosen  to  present  an  increasing  challenge  to  the  algorithms.  One  s-:b:c:  consisted  of 
90  obser/auons,  obtained  by  omitting  die  hnai  data  point  from  rats  6-10,  the  nnai  two  data  points  from  rats 
1 1-20,  the  anal  three  from  rats  21-2'  and  the  tinal  four  from  rats  26-30.  The  other  subset  consisted  of  75 
observadons,  obtained  from  the  90  by  retaining  only  one  of  the  observations  for  each  of  the  rats  16-30. 
Convergence  for  the  first  subset  required  about  50  iteradons  of  m  =  50;  convergence  for  the  second  about 
65  iteradons  of  m  =  50. 

Figure  8  summarizes  the  marginal  posteriors  for  the  growth  rate  parameter  obtained  for  the  two  data 
subsets  from  the  Gibbs  and  EM-type  algorithms,  respecdvely.  It  can  be  seen  that  while  the  EM  approxima¬ 
tion  is  perhaps  tolerable  for  the  full  data  set  (Figure  7),  it  is  very  poor  for  the  smaller  data  sets. 

Consider  now  the  data  set  of  90  observadons  and  suppose  that  the  problem  of  interest  is  the  predic¬ 
tion  of  the  future  growth  pattern  for  one  of  the  rats  for  which  there  is  currently  just  the  first  observation 

available  (i  =  26 . 30).  Specifically,  suppose  we  consider  predicting  j  =  2. 3, 4, 5,  corresponding  to 

x,2  =  15,  x,3  =  22,  X.4  =  29,  x,}  =  36  days.  Then,  formally, 

[T.yiy]  =  /  [y.,ldi.o^]*[0,.a=irh 


where 


An  esdmate  of  [Yij\Y]  of  the  form  (1)  is  thus  easily  obtained  by  averaging  [T.yl^i.cr]  over  pairs  of 
(9i,a~)  obtained  at  the  final  cycle  of  the  Gibbs  sampler.  Figure  9  shows,  for  i  =  26,  bands  drawn  through 
the  individual  95%  predicdve  interval  limits  calculated  at  days  15.  22,  29  and  36,  together  with  the  subse¬ 
quently  observed  values  at  those  points. 
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Altemauvely,  we  could  view  the  omitted  or.  in  general,  as  yet  unobserved  data  points  as  missing  data. 
The  Gibbs  sampler  could  then  be  implemented  treaung  such  Y,j  as  unobservable  (in  addition  to  the  model 
parameters)  since  the  required  full  conditional  distributions  have  the  form  (6). 

7.  Missing  data  in  a  cross-over  trial 

The  balanced  two-period  cross-over  design  is  widely  used;  for  example,  in  the  pharmaceutical  indus¬ 
try  for  bioequivalence  studies  involving  a  standard  and  a  new  drug  formulation.  A  and  B,  say  (Racine  et  ai. 
1986;  Racine-Poon  et  al,  1987).  Assuming  n  subjects,  the  standard  random  effects  model  for  a  two-period 
cross-over  is  given  by: 

no.) 

where 

no.)  =  response  to  the  ith  subject  (i  =  1 . n)  receiving  the 

yth  formulation  (/  =  1,2)  in  the  /fcth  period  {k  =  1,2); 

/r  =  overall  mean  level  of  response; 

0  =  difference  in  formulation  effects; 

t:  =  difference  in  period  effects; 

5,  =  random  effect  of  ith  subject; 

=  measurement  error. 

The  Si,  are  assuming  independent,  for  all  i,j,k,  with  ~  iV(0,crf),  o,-  -  A^(0, cr2*). 

Suppose  now  that  subjects  i  =  1 . M  have  data  missing  from  one  of  the  two  periods;  subjects 

(  =  Af  +  1 . n  have  complete  data.  We  shall  write 


-  IS  • 


AO. 


where  the  ‘observauons’  within  subject  i  have  been  labelled  such  that  K,-  is  the  observed  data.  O.  is  miss¬ 
ing,  and  X,  defines  the  corresponding  design  matrix.  For  subjects  i  =  M+\ . n,  Yi  =  simply  denotes 

all  the  observed  data.  We  write  U  =  If/, . V  =  (V......  V',)’’.  Then,  conditional  on 


9  =  (ji,  o.rrj ' .  £  = 


M2 


crc 


crc  cr,* 


where  (7-%  =  crr-cT;".  w'e  have 


where 


N(xe,s), 


Here,  Y  is  2/ixl,  X  is  2/1x3  and  5  is  2nx2n. 

It  is  convenient  to  work  with  Q,a\,a^ ,  where  <73*  =  cr, and  to  note  that,  if  we  define 


Y*  =  average  response  of  the  ith  subject 


^(22))  AS 


Y~  =  difference  of  the  two  responses  for  the  ith  subject 


^1(22))  AS 


then: 
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:or  Lhe  .4jS  sequence. 


ror  the  5.4  sequence. 


0 

CTf 


If  we  now  make  the  prior  specification 


It  can  be  seen  that 


[aC.af  jC/.V,  0]  «  (CT|‘)  2  exp - 1-^55 


2<7r 


(o-r) 


■(  2 


exp  - 


:c7r 


x(aj^)  ^sxpj-^sSj 

ZCTj* 


(cr3‘)  ^exp 


V’tTb 

9rr2 


where 


55, 


-2  £  fr,.--(2^)]\2  J 


AB  S4q, 


BA  stq.  ^ 


,T-(9 


12 


553  =  2  f  (C-/2)^. 


1=1 


It  follows  that  a  Gibbs  sampler  for  af ,  <732,  5  and  ff  is  specified  by: 


-  :o  - 


with 


X'^S~^Y  =  X  X^L~'Yi, 
1  =  1 

X'^S~^X  =:  £  Xi'^I'^Xi, 

i  =  l 


D  =  A'^S-‘Ar  +  C'‘, 


[V\V,  9,a^,ai] 


n\ 

\x,9+^-{V-X^9),  errz 

/.vr 

[  <^12 

L  \‘^I2/  J 

with 


Table  4  summarizes  data  from  a  (complete  data)  trial  conducted  with  n  =  10  subjects,  in  which  responses 
are  treated  as  missing  from  subject  1  in  period  1.  Subject  3  in  period  2  and  subject  6  in  period  2. 
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Table  4:  Data  from  a  two-period  cross-over  trial 


Subject 

Sequence 

Period  1 

Period  2 

1 

AB 

1.40 

1.65 

2 

AB 

1.64 

1.57 

3 

BA 

1.44 

1.58 

4 

BA 

1.36 

1.68 

5 

BA 

1.65 

1.69 

6 

AB 

1.08 

1.31 

7 

AB 

1.09 

1.43 

8 

AB 

1.25 

1.44 

9 

BA 

1.25 

1.39 

10 

BA 

1.30 

1.52 

A  =  new  tablet.  B  =  standard  tablet;  formulations  of  Carbamazepine. 

Data  are  observations  of  the  logarithms  of  1.52  maxima  of  concentration-time  curv  es; 
see  Maas  et  at  (1987)  for  background  and  further  details. 

Convergence  was  achieved  within  30  iterations  of  /n  =  50.  Figure  10a  shows  the  marginal  posterior 
density  for  cTf;  Figure  lOb  shows  the  marginal  posterior  density  for  <p,  the  treatment  effect.  The  Gibbs 
sampler  also  automatically  provides  ‘predictive’  densities  for  the  missing  responses;  these  are  shown  in 
Figure  1 1  and  their  locations  may  be  compared  with  the  actual  missing  values.  Finally,  the  ease  with  which 
the  Gibbs  sampler  permits  analysis  of  this  cross-over  model  enables  informative  sensitivity  studies  to  be 
performed.  In  particul^,  we  can  easily  study  the  difference  between  the  treatment  posterior  density  based 
on  the  complete  data,  the  data  omitting  the  assumed  missing  values,  and  the  data  based  on  just  the  seven 
subjects  for  whom  full  data  was  assumed.  The  resulting  posteriors  are  shown  in  Figure  12  and  reveal  a 
typical  finding  in  such  trials.  Namely:  that  if  there  is  missing  (at  random)  data  from  a  subject  we  might  just 


as  weil  ignore  uie  subject  altogether;  also,  that  the  ioss  ot  30‘i^c  of  subjects  in  a  small  trial  results  in  sub¬ 
stantially  increased  inferential  uncertainty. 

8.  Summary  discussion 

The  range  of  normal  data  problems  considered  above  as  illustrations  of  the  ease  with  which  numerical 
Bayesian  inferences  can  be  obtained  via  Gibbs  sampiing.  include  the  I'cilowing  aspects; 

awkward  postenor  distributions,  other.vise  requiring  subtle  and  sophisticated  numerical  or  analytic 

approximation  techniques  (Sections  4  and  5): 

further  distributional  complexity  introduced  by  order  constraints  on  model  parameters  (Section  5); 

dimensionality  problems.  t>'picaily  putting  out  of  reach  the  implementation  of  other  sophisticated 

approximation  techniques  (Section  6); 

messy  and  intractable  distribution  theory  arising  from  missing  data  in  designed  experiments  (Section 

7): 

general  functions  of  model  parameters,  including  so-called  Fieller-Creasy  problems  (Section  4); 

awkward  predictive  inference  (Section  6). 

In  all  these  situations,  we  have  seen  that  the  Gibbs  sampler  approach  is  straightforward  to  specify 
distributionally,  trivial  to  implement  computationally  and  with  output  readily  translated  into  required  infer¬ 
ence  summaries. 

The  potential  of  the  methodology  is  enormous,  rendering  straightforward  the  analysis  of  a  number  of 
problems  hitherto  regarded  as  intractable  from  a  Bayesian  perspective.  Work  is  in  progress  in  extending  the 
range  of  implementation.  First,  by  developing,  where  necessary,  purpose-built  efficient  random  variate  gen¬ 
erators  for  conditional  distribution  forms  arising  in  panicuiar  classes  of  applications;  secondly,  by  facilitat¬ 
ing  the  reporting  of  bivariate  and  conditional  inference  summaries,  in  addition  to  univariate  marginal 


ciir\-es.  '-V'e  plan  lo  repon  shoriiy  on  vanous  of  these  extensions. 
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gure  2:  Convergence  of  Estiaaced  Densities  of  Variance 
Components  Under  Prior  Soecif ication  II 
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gure  8:  Estimated  densities  for  population  growth  rate 
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Figure  9:  Estimated  95%  predictive  intervals  for  furture  observations  (+) 
given  the  first  observation  (x)  of  rat  26  (90  observation  case). 


igure  12  ;  Posterior  densities  of  (p  in  the  cross-over  trial 
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